Time Series Analysis homework

0. Write a guide for your classmates on how to tell if a process is an AR process or an MA process, how to figure out its order (e.g. AR(1) vs AR(4)), and what kinds of effects nonstationarity can have on such an analysis. Include paragraphs, formulas, flowcharts, example real data sets, example artificial data sets that you construct, plots, etc. as needed/useful.

1. The Keeling Curve is a famous (to geeks) data set on the amount of carbon dioxide in the atmosphere, from 1958 to now.
Read http://en.wikipedia.org/wiki/Keeling_Curve
Take the Keeling Curve data at ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt
Use the "Interpolated" column (the second column that has numbers like 315.71).
a) What function form (with undetermined parameters) will be a good model for the data? Set 1958.0 as time 0. Your function might look like these deliberately wrong examples:
    f(t) = a*log(t) + b*t + c
    f(t) = a*tan(omega*t) + sqrt(t/c)
b) Choose parameters for your function that fit the data. Use all but the most recent year's worth of data. For example, if you're doing this homework in January 2009, use data up through Dec. 2007.
c) For the year you left out of part (b), predict the value for each month of that year, and compare to the actual values in the data set. For example (as above), predict Jan. 2008 through Dec. 2008.
d) Choose parameters for your function that fit the data, but leaving out the most recent 10 years of data. Predict the most recent year using those parameters. For example (as above), use data up through Dec. 1997, then predict for 2008. Compare to the actual data.

--------------------------------------------

2. Give an update on your project thoughts.
Project proposals are due Friday, Feb. 6th 2009.
Projects are due Friday, Feb. 20th 2009 (the day before break week).
Project presentations start Tuesday, Mar. 3rd.

--------------------------------------------

3. Consider the weather data given in the provided file.
a) What function form (with undetermined parameters) will be a good model for the data? Set 1958.0 as time 0. Your function might look like these deliberately wrong examples:
    f(t) = a*log(t) + b*t + c
    f(t) = a*tan(omega*t) + sqrt(t/c)
b) Choose parameters for your function that fit the data. Use all the data.
c) Discuss the patterns you found. When is the yearly peak? When is the daily peak?
d) Examine the residuals. Are they an AR process (what order?) Are they an MA process (what order?)
Choose one of these three refinements to do:
e) Use the residuals to discuss the phrase "If you don't like the weather, wait 5 (or 10? 15?) minutes--it will change".
f) Include a once-a-week cycle to see if human weekday/weekend activity is influencing the weather. Discuss your results.
g) Include a twice-a-year cycle and a twice-a-day cycle. Fit the parameters to the data again, and discuss how important these faster cycles are.

--------------------------------------------

4. Suppose you have a sine wave f(t) = sin(omega * 2*pi*t) for varying values of the frequency omega, from omega=1 per second to omega=20,000 per second. You sample every 1/44100th of a second (CD-quality audio).
a) Suppose you apply a running-mean filter of size 3 samples. (Yes, it is somewhat silly to apply a filter to a sample with no noise, but keeping noise out does simplify things.) Plot the amount of damping as a function of frequency.
b) Repeat for a filter size of 5 samples, etc.
c) (optional) Can you find a general expression for damping as a function of frequency and filter size?
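
One possible way to set up the measurement in problem 4 is sketched below in Python with NumPy. It is only a starting point, not the expected solution. The names `SAMPLE_RATE`, `running_mean_gain`, and `gain_curve` are made up for this illustration. The sketch assumes that "damping" is reported as the ratio of output amplitude to input amplitude (1 means no damping, 0 means the wave is removed entirely), that omega takes whole-number values in cycles per second, and that the running mean averages each sample with the samples just before it.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second (CD-quality audio)


def running_mean_gain(freq_hz, size, sample_rate=SAMPLE_RATE):
    """Output/input RMS amplitude ratio for a running mean of `size` samples.

    Assumes freq_hz is a whole number of cycles per second, so one second
    of samples holds a whole number of cycles.
    """
    n = np.arange(sample_rate)  # sample indices for exactly one second
    x = np.sin(2 * np.pi * freq_hz * n / sample_rate)
    # Circular running mean: average each sample with the (size - 1) samples
    # before it, wrapping around at the start. The wrap-around is harmless
    # here because the sampled wave repeats exactly after one second.
    y = sum(np.roll(x, k) for k in range(size)) / size
    return np.sqrt(np.mean(y**2)) / np.sqrt(np.mean(x**2))


def gain_curve(freqs_hz, size):
    """Evaluate running_mean_gain at each frequency in freqs_hz."""
    return np.array([running_mean_gain(f, size) for f in freqs_hz])
```

For a plot, something like `gain_curve(np.arange(100, 20001, 100), 3)` gives a coarse curve quickly; evaluating every frequency from 1 to 20,000 works the same way but takes noticeably longer. Measuring the curve numerically like this also gives you something to check a candidate formula for part (c) against.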